Corpus Linguistics in Finland: a Resource Survey*

Author

  • M. V. Kopotev
Abstract

Finnish corpus linguistics, and computational linguistics in general, has a long tradition, which gives it authority in the international community and has produced solid results in various areas. The first electronic corpus projects appeared in the 1960s, as in many other countries [1, 2]. From the start, this work in Finland was closely tied to the writing of original text-processing software, as well as to close international links and a commitment to current topics in lexicography and grammatical description ([3-6]; see also the round-table material on corpus linguistics in "Korpuslingvistiikan työpaja l: Korpukset ja ohjelmat", pp. 126-134 of [7]). A major feature of computational linguistics in Finland has been its close connection with the development of end-user products, including collaboration with commercial firms [8; 1, pp. 50-54 and 62-64]. This paper is informational: its purpose is to give Russian linguists an overview of the main computational linguistic resources in Finland and to determine the scope for using them. For each existing corpus, its current location is given, which in some cases amounts to indicating the place where it was created and initially stored. The characteristics of each corpus are indicated by listing the detailed descriptions (on the Internet and/or in print) from which full information can be obtained. Many of the resources described below provide remote access to the files (most of the servers run Unix, which generally requires a Unix client on the user's machine, e.g., the program FSecure SSH-Client). I do not discuss the technical and organizational aspects of access in detail and merely note that almost all of the resources are freely available for research and teaching purposes. In most cases, this requires permission from the administrator or owner of the corpus.
Contact information is given on the corresponding Internet sites or in articles on the topic. One important comment concerns the definition of the term corpus. The term is used ambiguously or loosely, which has led to a general tendency to call any collection of texts in digital format an electronic corpus. On the other hand, the term corpus has recently been used increasingly not simply for text (English running text) but for linguistic material specially selected on certain principles. "So a corpus in modern linguistics, in contrast to being simply any body of text, might more accurately be described as a finite-sized body of machine-readable text, sampled in order to be maximally representative of the language variety under consideration" [9, p. 24]. However, despite the spread of the new approach, old corpora (i.e., simply electronic texts) still retain their linguistic value in many areas. This depends on substantial differences in the quantity and quality of the work done. For example, the most recent text collections in English pose substantially more complicated tasks (various types of annotation, parallel corpora, speech records presented in electronic form, and so on). On the other hand, for many modern languages there are as yet no simple, well-balanced representative corpora, let alone annotated ones. Special and equally difficult problems arise in creating any corpus of ancient texts. The


Publication date: 2007